MicroRNAs are essential post-transcriptional regulators. Many animal microRNAs are clustered in the genome, and it has been shown that clustered microRNAs may be transcribed as a single transcript. Polycistronic microRNAs are often members of the same family, suggesting a role for tandem duplication in the emergence of clusters. The mir-2 microRNA family is the largest in Drosophila melanogaster, with 8 members that are mostly clustered in the genome. Previous studies suggest that the copy number and genomic distribution of mir-2 family members have been subject to significant change during evolution. The effects of such changes on their function are still unknown. Here we study the evolution of function in the mir-2 family. Our analyses show that, in spite of the changes in number and organization among invertebrates, most mir-2 loci produce very similar mature microRNA products. Multiple mature miR-2 sequences are predicted to target genes involved in neural development in Drosophila. These targeting properties are conserved in the distant species Caenorhabditis elegans. Duplication followed by functional diversification is frequent during protein-coding gene evolution. However, our results suggest that the production of microRNA clusters by gene duplication rarely involves functional changes. This pattern of functional redundancy among clustered paralogous microRNAs reflects birth-and-death evolutionary dynamics. Nevertheless, we identified a small number of mir-2 sequences in Drosophila that may have undergone functional shifts associated with genomic rearrangements. Therefore, redundancy in microRNA families may facilitate the acquisition of novel functional features.



Introduction 

MicroRNAs, crucial regulators of gene expression at the post-transcriptional level, are often clustered in the genome.[1] According to miRBase,[2] more than a quarter of both Drosophila and human microRNAs are less than 10 kb away from other microRNAs. These clustered microRNAs are often co-expressed, suggesting that they are produced from a single transcript.[3-6] The majority of microRNA clusters contain members of the same family, indicating a major role of tandem duplication in cluster formation.[7-9] In the case of protein-coding genes, duplication is acknowledged as the main source of functional innovation, since duplicates are free to diversify in their functions.[10] Similarly, duplicated microRNAs may acquire new targets and therefore novel functions. However, microRNAs processed from the same transcript are linked by their expression pattern, imposing a functional constraint on their evolutionary diversification. Whether microRNA tandem duplications facilitate the emergence of new functions or generate redundant products remains to be explored.

Mir-2 is the largest microRNA family in Drosophila melanogaster and one of the first to be discovered.[11-13] The mir-2 family has 8 members in the D. melanogaster genome (mir-2a-1, mir-2a-2, mir-2b-1, mir-2b-2, mir-2c, mir-13a, mir-13b-1 and mir-13b-2), six of which are organized in two clusters.[14] In most other studied insects, there are five mir-2 sequences encoded by a single transcript (see ref. 15 and references therein). Caenorhabditis elegans has only one mir-2 sequence.[12,13]

Here we study the mir-2 family to investigate the impact of microRNA family expansions on functional diversification. We combine comparative genomics with expression data analyses and functional annotation of predicted targets to compare the functional features of mir-2 sequences. These results help to clarify the role of tandem microRNA duplications in the evolution of gene regulation.

Results 

Mir-2 is a conserved microRNA family in invertebrates. To characterize mir-2 family members, we performed comprehensive sequence similarity searches against multiple sequenced genomes (see Materials and Methods). We detected mir-2 hairpin precursor sequences in many invertebrates (Fig. 1A; File S1) but none in vertebrate species. The 3' arm of the hairpin is highly conserved, whereas the 5' arm shows many changes, all of which are consistent with the precursor hairpin structure (Fig. 1A). The gene copy number is highly variable among species (from one in C. elegans to eight in D. melanogaster), suggesting that the mir-2 content of each lineage is the product of multiple birth-and-death events.

Since mir-2 sequences are short and very similar, their genomic contexts can improve our ability to annotate them and to explore their evolutionary origins. The genomic organization of mir-2 family members across phyla (Fig. 2) suggests that the ancestral mir-2 microRNA was clustered with mir-71, an evolutionarily unrelated microRNA. Mir-71 itself is found in protostomes, but also in cephalochordates, hemichordates and echinoderms.[16,17] The origin of mir-71 therefore predates the split of protostomes and deuterostomes, although it has been lost in some chordate lineages, including vertebrates. Mir-2 arose later, most likely before the last common ancestor of protostomes. Although mir-71 and mir-2 are still linked in most species, mir-71 has been lost independently in two dipteran lineages. The expansion of the mir-2 family by tandem duplication and deletion has generated mir-2 clusters of different lengths in different species. The mir-13 subfamily has a conserved, characteristic one-nucleotide deletion in its 3' arm (Fig. 1A), indicating that these sequences originated from a duplicated mir-2 locus in the common ancestor of insects. Combined analysis of sequence conservation and cluster structure (Figs. 1A and 2) suggests that the ancestral insect cluster split in two in the Drosophila lineage, with subsequent additional duplications. As a consequence, different mir-2 copies in Drosophila are under the transcriptional control of different regulatory sequences.

Figure 1. Sequence conservation in the mir-2 family. (A) Alignment of mir-2 precursor sequences in representative genomes, shaded by sequence conservation (visualized using RALEE[42]); darker tones reflect higher conservation. The structure of the consensus sequence is shown below the alignment in dot-bracket notation. The open white box over the alignment indicates the canonical mature product, with the seed sequence highlighted (black). (B) Consensus structure of the mir-2 precursor in invertebrates, colored with VARNA[43] according to sequence conservation. The canonical and non-canonical mature products produced by some mir-2 precursors are also indicated.

Figure 2. Copy distribution of mir-2 sequences. Phylogenetic tree of invertebrate species and genomic organization of mir-2 sequences. Divergence times were extracted from ref. 47. Black arrows depict mir-2 family members, and white arrows mir-71 sequences. Arrows linked by the same straight line indicate microRNAs less than 10 kb apart in the genome.

Functional conservation and redundancy of mir-2 products. The pattern of sequence conservation in the mir-2 family shown in Figure 1 suggests that the dominant mature microRNA is produced from the 3' arm of mir-2 precursors. Our re-analysis of deep-sequencing data from D. melanogaster, Tribolium castaneum and C. elegans confirms that the 3' arm is highly expressed compared with the 5' arm in most mir-2 family members (File S2). Deep-sequencing analyses from honeybee and silkworm also reveal the same expression pattern.[18,19] We observe that most mir-2 sequences conserve the location of the Drosha and Dicer cleavage sites. The position of these cleavage sites determines the 5' end of the mature microRNA, and hence its seed sequence. The seed is defined as nucleotides 2 to 7 of a mature microRNA, and it is crucial for transcript targeting.[20] Since sequence conservation is very high in the 3' arms, seed sequences are the same for all mir-2 family products in which the Dicer cleavage site is conserved (Fig. 1).
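To make the seed definition concrete, the sketch below extracts nucleotides 2 to 7 from two mature products cut from the same 3' arm, one starting 2 nt further in the 3' direction, as happens when a cleavage site shifts. The arm sequence is hypothetical (invented for illustration, not taken from miRBase), and the 22-nt product length is an assumption.

```python
def seed(mature: str, first: int = 2, last: int = 7) -> str:
    """Seed region: nucleotides first..last (1-based, inclusive) of a mature microRNA."""
    return mature[first - 1:last]


def mature_from_arm(arm: str, start: int, length: int = 22) -> str:
    """Cut a mature product of `length` nt beginning at 0-based position `start` of the arm."""
    return arm[start:start + length]


# Hypothetical 3' arm sequence, used only for illustration
arm = "UAUCACAGCCAGCUUUGAUGAGCUU"

canonical = mature_from_arm(arm, 0)  # product starting at the conserved cleavage site
shifted = mature_from_arm(arm, 2)    # product whose 5' end is shifted by 2 nt

print(seed(canonical), seed(shifted))  # AUCACA CACAGC
```

Although both products share most of their sequence, the 2-nt shift changes every position of the seed, which is why offset products are expected to have different seed-based targets.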

Functional shifts in mir-2 products. Deep-sequencing data from Drosophila suggest that, in contrast to the majority of mir-2 family members, the 3' arm of mir-2a produces two alternative mature products. Each accounts for a substantial proportion of the reads produced by mir-2a loci (47% and 28%), and they are offset from one another by 2 nucleotides. The first of these products (the 5'-most) is processed identically to the conserved mature sequence produced from the majority of mir-2 family members, termed here the 'canonical' product (Fig. 1B). The second is offset by 2 nucleotides in the 3' direction, and is termed the 'non-canonical' product (Fig. 1B). Both products map exactly to two alternative hairpin precursors, mir-2a-1 and mir-2a-2, suggesting that both could potentially be made from either locus. However, the 5' arms of these two hairpins differ in sequence, so reads mapping to the 5' arms can be assigned to one hairpin or the other. It has been previously reported that the characteristic two-nucleotide 3' overhang of mature microRNA duplexes allows reads from the 3' arm to be assigned to one hairpin or the other, even though the 3' arm sequences are identical (the ends of the locus-specific 5' arm reads predict the ends of their 3' arm partners).[14,21,22] This approach predicts that the non-canonical mature sequence, offset by 2 nucleotides, is produced overwhelmingly from the mir-2a-2 locus, whereas mir-2a-1 is processed identically to the other mir-2 family members. Analysis of deep-sequencing data from an RNA immunoprecipitation (RIP-seq) of Argonaute proteins shows that both canonical and non-canonical mature products are loaded into the RNA-induced silencing complex (RISC),[23] and are therefore likely to be functional. The seed sequences of canonical and non-canonical mature microRNAs are offset, and hence differ in sequence, suggesting that they regulate different targets. Drosophila mir-2c also produces an offset, non-canonical mature product. However, this microRNA is expressed at a very low level and is not found in the AGO RIP-seq data set.[22,23] Our data show that a substantial fraction of non-canonical mir-2 products are also expressed from mir-2 loci in T. castaneum (File S2) and in the honeybee Apis mellifera (data not shown). However, the strategy described above cannot be applied to assign these reads to a single locus.

Table 1. Top 20 enriched GO terms in the developmental process category

Species         Enriched GO term                                          # genes¹  q-value²
Drosophila      multicellular organismal development                           121  0.0000
                nervous system development                                      67  0.0000
                central nervous system development                              25  0.0000
                sensory organ development                                       37  0.0000
                anatomical structure morphogenesis                              91  0.0000
                organ morphogenesis                                             48  0.0000
                neurogenesis                                                    54  0.0000
                cell differentiation                                            84  0.0000
                neuron differentiation                                          46  0.0000
                developmental process                                          129  0.0000
                cell fate commitment                                            31  0.0000
                organ development                                               79  0.0000
                generation of neurons                                           53  0.0000
                system development                                             106  0.0000
                anatomical structure development                               123  0.0000
                cellular developmental process                                  85  0.0000
                brain development                                               16  0.0011
                eye development                                                 30  0.0022
                neuron development                                              38  0.0023
                regionalization                                                 37  0.0023
Caenorhabditis  cellular component morphogenesis                                38  0.0047
                anatomical structure morphogenesis                             112  0.0062
                neurogenesis                                                    22  0.0164
                generation of neurons                                           22  0.0164
                neuron development                                              20  0.0165
                cell morphogenesis                                              22  0.0167
                neuron differentiation                                          21  0.0167
                muscle structure development                                    23  0.0187
                muscle organ development                                         6  0.0191
                nervous system development                                      22  0.0204
                neuron projection morphogenesis                                 18  0.0209
                organ morphogenesis                                             10  0.0226
                axonal fasciculation                                            11  0.0232
                neuron projection development                                   18  0.0233
                anatomical structure formation involved in morphogenesis        25  0.0238
                cell projection morphogenesis                                   19  0.0245
                syncytium formation by plasma membrane fusion                    3  0.0334
                syncytium formation                                              3  0.0334
                cell part morphogenesis                                         19  0.0347
                neuron recognition                                              11  0.0356

¹ Number of genes predicted as canonical seed targets (see Materials and Methods) annotated to the GO term.
² q-value: the p-value corrected for a false discovery rate of 0.05 (ref. 46).

Unlike other mir-2 members, the mir-2a-2 precursor produces approximately equal amounts of mature sequences from each arm of the hairpin.[21,22] Nevertheless, mature sequences derived from the 5' arm are not observed in AGO RIP-seq experiments[22,23] and are therefore not predicted to be loaded into RISC. This further supports a dominant role of the mature sequence from the 3' arm across the mir-2 family.
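The read proportions reported in this section for mir-2a (47% canonical and 28% non-canonical), and the comparison of 5' and 3' arm output, amount to classifying mapped reads by arm and by the precursor coordinate of their 5' end. The sketch below assumes reads have already been mapped to a single hairpin and reduced to (arm, 5' start) pairs; the function name, input format and coordinates are hypothetical, and this is not the pipeline used in the study.

```python
from collections import Counter


def product_fractions(reads, canonical_3p_start, shift=2):
    """Fraction of reads per mature-product class on one hairpin.

    reads: iterable of (arm, start) pairs, where arm is "5p" or "3p" and start is
    the 0-based precursor coordinate of the read's 5' end (hypothetical format).
    """
    counts = Counter()
    for arm, start in reads:
        if arm == "5p":
            counts["5p"] += 1
        elif start == canonical_3p_start:
            counts["3p canonical"] += 1
        elif start == canonical_3p_start + shift:
            counts["3p non-canonical"] += 1  # 5' end shifted 2 nt in the 3' direction
        else:
            counts["3p other"] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads supplied")
    return {label: n / total for label, n in counts.items()}


# Toy reads (not real data): 5 canonical, 3 shifted by +2 nt, 1 other 3p read, 1 from the 5' arm
toy_reads = [("3p", 60)] * 5 + [("3p", 62)] * 3 + [("3p", 61)] + [("5p", 10)]
fractions = product_fractions(toy_reads, canonical_3p_start=60)
print(fractions)
```

Exact-coordinate matching is a simplification; real analyses usually also consider read length and tolerate small 3'-end heterogeneity.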

Mir-2 products are likely to target neural genes. We have shown that mature products from mir-2 loci are highly conserved and are likely to have the same targeting properties. Do mir-2 sequences therefore conserve their targets throughout evolution? We addressed this question by comparing the targets of D. melanogaster and C. elegans miR-2 mature sequences. We used the canonical seed method[20] to predict transcripts whose 3' UTRs are targeted by the miR-2 family members in D. melanogaster and by the only miR-2 sequence in C. elegans (see Materials and Methods). All but two miR-2 sequences in Drosophila have identical seeds and therefore identical predicted target sets (Fig. 1A). The two microRNAs with different targets were miR-2a-2 and miR-2c, which are offset with respect to the canonical mir-2 products (Fig. 1B).
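The core of seed-based target prediction can be sketched as a scan of each 3' UTR for a perfect Watson-Crick match to the seed. This minimal version assumes a site is the reverse complement of seed nucleotides 2 to 7; the published canonical seed method (ref. 20) may impose further criteria (site types, conservation) that are not modeled here, and the gene names and UTR sequences are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string over A/C/G/U."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_site(mature: str) -> str:
    """Target-site motif: reverse complement of seed nucleotides 2-7."""
    return reverse_complement(mature[1:7])


def has_seed_match(mature: str, utr: str) -> bool:
    """True if the 3' UTR contains a perfect match to the seed site."""
    utr = utr.upper().replace("T", "U")  # accept UTRs written in the DNA alphabet
    return seed_site(mature) in utr


def predict_targets(mature: str, utrs: dict[str, str]) -> set[str]:
    """Genes whose 3' UTR contains at least one seed site."""
    return {gene for gene, utr in utrs.items() if has_seed_match(mature, utr)}


# Hypothetical mature products (2-nt offset) and 3' UTRs, for illustration only
canonical = "UAUCACAGCCAGCUUUGAUGAG"
shifted = "UCACAGCCAGCUUUGAUGAGCU"
utrs = {"gene_a": "AAUGUGAUAAC", "gene_b": "CCGCUGUGCC", "gene_c": "GGGGCCCC"}

print(predict_targets(canonical, utrs), predict_targets(shifted, utrs))
```

Because predictions depend only on the seed, any two mature sequences sharing nucleotides 2 to 7 receive identical target sets, while an offset product is scanned for an entirely different motif.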

We mapped Gene Ontology (GO) terms to the predicted targets of miR-2 family members, and analyzed the set of terms that were statistically enriched in the targeted gene set (see Materials and Methods). We focused on terms within the 'Developmental process' category, which is particularly informative for development and tissue specificity.[24] We detected 675 genes targeted by the canonical Drosophila miR-2 sequence, and 979 for the functional Caenorhabditis miR-2 product. For both Drosophila and Caenorhabditis, we observed an enrichment in genes involved in neural development (Table 1). We therefore predict a role for mir-2 in neural function. Indeed, expression data from deep-sequencing analyses in Drosophila indicate that mir-2 products are highly expressed in adult heads.[14] We also studied the targets of the non-canonical products from miR-2a-2 and miR-2c in Drosophila. Both miR-2a-2 and miR-2c are predicted to target 286 genes. In these cases, we did not find any significantly enriched functional classes (not shown).
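The q-values in Table 1 are p-values corrected for a false discovery rate (ref. 46). Assuming that correction is the Benjamini-Hochberg step-up procedure (a common choice; the exact method of ref. 46 is not restated in this section), q-values can be obtained from per-term enrichment p-values as sketched below. The p-values and term names are invented for illustration.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Invented enrichment p-values for four GO terms (illustration only)
terms = ["term A", "term B", "term C", "term D"]
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
enriched = [term for term, qv in zip(terms, q) if qv < 0.05]
print(enriched)
```

Note that "term B" and "term C" have raw p-values below 0.05 but are not retained once the correction for testing four terms is applied.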

The seed model for microRNA targets predicts that offset mature products from the mir-2a-1 and mir-2a-2 loci will target different sites. However, it is well established that sequence complementarity outside the seed motif is important (and perhaps even sometimes sufficient) for target recognition (reviewed in ref. 25). To explore whether offset microRNAs with the same nucleotide sequence may have different targeting properties, we predicted targets with a different tool, miRanda, which places less weight on the microRNA seed and accounts more fully for the hybridization energy between the microRNA and the target.[26] With this tool, Drosophila miR-2a-1 is predicted to target 553 transcripts and miR-2a-2 to bind 788, with 368 targeted genes common to both. The overlap of target genes is greater than expected by chance (p = 0.008, see Materials and Methods). This suggests that, although the seed shifting between miR-2a-1 and miR-2a-2 may induce functional changes, the two microRNAs likely retain partially redundant targeting properties.
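The test behind the overlap p-value is described in Materials and Methods, which is not part of this section. One standard way to ask whether two gene sets overlap more than expected by chance is a one-sided hypergeometric test, sketched below under that assumption and assuming both sets are drawn from a common background of genes. Because the background size is not given here, no value is computed for the 553/788/368 comparison; the example numbers are toy values.

```python
from math import comb


def overlap_pvalue(n_background: int, n_a: int, n_b: int, n_shared: int) -> float:
    """One-sided hypergeometric P(overlap >= n_shared) for two gene sets.

    Assumes set B (size n_b) is a random draw from n_background genes,
    of which n_a belong to set A.
    """
    upper = min(n_a, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_background - n_a, n_b - k)
        for k in range(n_shared, upper + 1)
    )
    return tail / comb(n_background, n_b)


def expected_overlap(n_background: int, n_a: int, n_b: int) -> float:
    """Mean overlap under the same random-draw model."""
    return n_a * n_b / n_background


# Toy example: 10 background genes, two sets of 5 that share all 5 members
print(overlap_pvalue(10, 5, 5, 5), expected_overlap(10, 5, 5))
```

Python integers have arbitrary precision, so the binomial coefficients stay exact even for genome-scale backgrounds; only the final division produces a float.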

Discussion 

The evolutionary history of microRNA families is characterized by frequent duplications, losses and rearrangements.[7,8,27,28] Here we describe the evolution of the largest conserved insect microRNA family, mir-2. We showed that this family is widely represented in invertebrates, and that its copy number and genomic distribution vary greatly between species. Deep-sequencing data reveal that all mir-2 family members produce their dominant mature microRNAs from the 3' arm, whose sequence is highly conserved (Fig. 1). Moreover, most mir-2 precursors have the same Dicer cleavage site, thus producing functional mature miR-2 sequences with the same seed region and predicted targets. According to the available deep-sequencing data, most mir-2 loci within the same species produce redundant products. In Drosophila, antisense-mediated inactivation of mir-2 sequences shows that multiple mir-2 loci have similar (if not identical) functions.[29,30]

It is well established that pairs of protein-coding loci resulting from gene duplication rapidly diverge in their sequence and/or expression pattern, since functional redundancy is generally a transient situation.[10] Duplication has also been proposed as a mechanism of microRNA functional diversification,[14] although there is no direct evidence of this pattern so far. The mir-2 family suggests that microRNA families may tolerate functional redundancy over the longer term, as multiple almost identical copies are present in many invertebrate genomes. One possible explanation is that mir-2 products are required at high levels and local tandem duplications produce a net increase in the expression level. This is supported by a previous observation that increased expression levels are associated with an increase in microRNA copy number.[31] On the other hand, the presence of redundant mir-2 paralogs might reflect essentiality (see discussion in ref. 29). Functional redundancy in clustered paralogous microRNAs has been previously reported, and may simply reflect high turnover and birth-and-death evolutionary dynamics.[8,27,28] These processes will generate clusters of very similar sequences, and account for the copy number differences between species.[32] The data strongly suggest that mir-2 family evolution is dominated by high turnover and birth-and-death dynamics mostly driven by random drift.

Clustered paralogous microRNAs are evolutionarily constrained since their expression patterns are linked. However, mir-2 family members in the Drosophila genus are located in two clusters and two single loci. This decoupling of their regulatory sequences may have facilitated functional changes. Indeed, we observe that the identical 3' arms of the mir-2a-1 and mir-2a-2 hairpin precursors produce different offset mature sequences, which we call here canonical and non-canonical miR-2 products (Fig. 1B). This phenomenon is called "seed shifting," and has been reported to induce functional changes between orthologous microRNAs.[15,17] Experiments in Drosophila suggest that mir-2 products are expressed in the brain and have (at least partially) redundant functions.[29,30] However, in situ hybridizations show that the clusters mir-2b-2/mir-2a-2/mir-2a-1 and mir-2c/mir-13a/mir-13b-1 and the single locus mir-13b-2 have different spatial expression patterns during early development.[33] We suggest that genomic reorganizations breaking the linkage between mir-2 loci in Drosophila triggered a subfunctionalization event.[34] Interestingly, in the flatworm Schistosoma mansoni we observe a duplication of the entire ancestral mir-2 cluster (Fig. 2 and ref. 35). Functional analysis of the mir-2 family in this parasitic species might shed light on the evolutionary dynamics of clustered microRNAs.

Mir-2 loci are highly expressed in adult heads in Drosophila[22] and in neurons in Caenorhabditis.[36] We show that the predicted targets of mir-2 microRNAs in both Drosophila and Caenorhabditis are significantly enriched for transcripts with neural development functions (Table 1). Mir-2 has also been found to be highly expressed in heads of Bombyx mori.[37] Antisense-mediated inactivation of mir-2 in Drosophila produces embryos with defects in head and posterior abdominal segments.[30] Mir-2 has been shown to specifically target the pro-apoptotic genes rpr, grim and skl.[29] Strikingly, these three genes are involved in the selective death by apoptosis of neuroblasts during normal development of the nervous system.[38] By targeting these pro-apoptotic genes, mir-2 can act as an anti-apoptotic factor in neurons. Indeed, the repression of rpr and grim by ABD-B prevents apoptosis in neural cells.[39] In light of these data, we speculate that mir-2 microRNAs have a fundamental role in neuron survival during development and adulthood.

Finally, we note that early work associated mir-6 and mir-11 sequences with the mir-2 family because they have identical (or very similar) seed sequences (e.g., ref. 29). However, there is no evidence of an evolutionary relationship between these three families. Moreover, mir-6 and mir-11 have expression patterns distinct from that of mir-2, so functional overlap among these families is unlikely.[29,30,33] We strongly encourage the use of the family name mir-2 to represent only mir-2/mir-13 sequences.

In summary, the mir-2 family is an invertebrate-specific family of microRNAs probably involved in neural development and maintenance. The number and genomic organization of mir-2 loci vary greatly between species, although the function of paralogous microRNAs is most often redundant. The retention of redundant sequences may be facilitated by the co-transcription of clustered microRNAs. In Drosophila, the ancestral mir-2 cluster has split into multiple independent transcripts, decoupling the transcriptional regulation among mir-2 loci. In this species we find evidence of potential functional shifts of some mir-2 family members.

Materials and Methods 

We retrieved all mir-2 precursor sequences from miRBase[2] (version 17) and used BLAST[40] (w = 4, r = 2, q = -3) to search for homologous sequences in multiple genomes from NCBI (www.ncbi.nlm.nih.gov/genome): Drosophila melanogaster, D. virilis, D. willistoni, D. pseudoobscura, Aedes aegypti, Anopheles gambiae, Acyrthosiphon pisum, Bombyx mori, Apis mellifera, Tribolium castaneum, Capitella teleta, Daphnia pulex, Caenorhabditis elegans, Gallus gallus, Mus musculus and Homo sapiens. We aligned the putative microRNA hairpin sequences with CMfinder[41] (n = 5, m = 30, M = 100), chose the output alignment that best reflects the microRNA hairpin pairing, and manually refined the alignment using RALEE.[42] The consensus sequence of the alignment was built by taking the most abundant base for each